13 May, 2020

Introduction

South Korea during COVID-19

  • One of the world’s most densely populated countries

  • 51.64 million inhabitants

  • First case of COVID-19 confirmed on the 20th of January 2020

  • 259 deaths caused by COVID-19

(South Korean Map, United States. Department Of State)

Research questions

  • How has the epidemic evolved in South Korea?

  • How is human behaviour driving the spread of the disease?

  • Is there any correlation between the place of infection and severity of the disease?

  • Does gender or age predispose people to contracting the disease or to a more severe outcome?

  • Can characteristic city features be used to predict the burden of disease?

Materials and methods

Workflow and Structure of the project

Reproducibility

  • The project includes all steps of the data analysis
  • This ensures consistent computational results

Data cleaning

  • Remove invalid data (NAs)

  • Remove unnecessary columns

  • Convert data into the tidy format:

    • Each variable has its own column

    • Each observation has its own row

    • Each value has its own cell
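
The cleaning steps above can be sketched in R with the tidyverse. This is a minimal illustration, not the project's actual script: the toy tables and their column names (`patient_id`, `sex`, `age`, `note`, `confirmed`) are assumptions made up for the example.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical toy table; column names are illustrative only.
raw <- tibble(
  patient_id = c(1, 2, 3),
  sex        = c("male", NA, "female"),
  age        = c("20s", "50s", "70s"),
  note       = c("x", "y", "z")  # assumed to be irrelevant to the analysis
)

clean <- raw %>%
  drop_na(sex) %>%  # remove rows with a missing value in `sex`
  select(-note)     # remove a column that is not needed

# Hypothetical wide table: one column per sex, so `sex` is not yet a variable.
wide <- tibble(
  date   = c("2020-03-01", "2020-03-02"),
  male   = c(10, 12),
  female = c(15, 18)
)

# Tidy format: one column per variable, one row per observation.
tidy <- wide %>%
  pivot_longer(
    cols      = c(male, female),
    names_to  = "sex",
    values_to = "confirmed"
  )
```

Here `tidy` holds one row per date and sex, so each value sits in its own cell.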

Data augmenting

  • Joining dataset tables using full_join

  • Subsetting data

  • Combining columns using unite

  • Creating new variables for the analysis
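
The augmenting steps above could look roughly like this with dplyr and tidyr. It is a sketch under stated assumptions: the two toy tables stand in for the patient tables, and the column names (`patient_id`, `province`, `city`, `state`, `type`) and the derived `location` and `deceased` variables are hypothetical.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical stand-ins for two dataset tables; values are made up.
patient_info <- tibble(
  patient_id = c(1, 2, 3),
  province   = c("Seoul", "Busan", "Daegu"),
  city       = c("Gangnam-gu", "Haeundae-gu", "Nam-gu"),
  state      = c("released", "deceased", "isolated")
)
patient_route <- tibble(
  patient_id = c(1, 2, 4),
  type       = c("hospital", "store", "airport")
)

patient <- patient_info %>%
  # full_join keeps every row from both tables, filling gaps with NA
  full_join(patient_route, by = "patient_id") %>%
  # subset: keep only patients with a known state
  filter(!is.na(state)) %>%
  # unite pastes two columns into one and drops the originals
  unite("location", province, city, sep = "_") %>%
  # new variable for the analysis
  mutate(deceased = state == "deceased")
```

A full join is the conservative choice here because it never silently drops patients that appear in only one table; the rows that are not wanted are then removed explicitly in the subsetting step.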

Final datasets

  • Case data (Case)

  • Patient data (Patient info + Patient route)

  • Time data (Time + Time age + Time gender + Time province + SearchTrend)

  • City data (region + Patient info)

Results

How has the epidemic evolved in South Korea?

How is human behaviour driving the spread of the disease?

Is there any correlation between the place of infection and severity of the disease?

Does gender or age predispose people to contracting the disease or to a more severe outcome?

Can characteristic city features be used to predict the burden of disease?

k-means accuracy: 42.5% (score_org), 49.6% (score_pca)

ANN accuracy: 39.5%

Shiny app

Conclusion and discussion

  • There’s no correlation between the place of infection and severity of the disease.

  • More females are diagnosed with COVID-19, but more males die from the disease.

  • Young people are driving the spread.

  • People in their 70s and 80s have a higher fatality rate.

  • Superspreaders cluster within certain age ranges.

  • ANN and k-means accuracy is roughly 40% to 50%, better than the 25% expected from random guessing with 4 classes.
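
The comparison with random guessing can be made explicit with a few lines of base R. This is a small worked example, assuming that "random" means guessing each of the 4 classes with equal probability; the vector names are illustrative, and the accuracies are the ones reported in the results.

```r
n_classes <- 4
chance <- 1 / n_classes  # expected accuracy of uniform random guessing: 0.25

# Accuracies reported in the results (as proportions).
accuracy <- c(score_org = 0.425, score_pca = 0.496, ann = 0.395)

# Margin over chance, in percentage points.
margin_pp <- round(100 * (accuracy - chance), 1)
print(margin_pp)
```

If the 4 classes are unbalanced, always predicting the largest class would score above 25%, so the majority-class share is a stricter baseline to compare against.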

Questions?

Superspreaders

Correlation matrix

PCA Variance explained

Regional cases plot

Search trends and confirmed cases